Comprehensive Graphical Testing for Exponentially Increasing Data

JSM 2023, Toronto, CA

Emily Robinson

Cal Poly - San Luis Obispo

Reka Howard

University of Nebraska - Lincoln

Susan VanderPlas

University of Nebraska - Lincoln

Outline

Motivation and Background

Comprehensive Graphical Testing

Research Goals

Perception through Lineups

Prediction through ‘You Draw It’

Numerical Translation and Estimation

Overall Conclusions and Discussion

Exponential Growth

Von Bergmann (2021)

Benefits and pitfalls of log scales

Burn-Murdoch et al. (2020)

Testing statistical graphics

Evaluate design choices and understand cognitive biases through the use of visual tests.

Could ask participants to:

  • identify differences in graphs.
  • read information off of a chart accurately.
  • use data to make correct real-world decisions.
  • predict the next few observations.

All of these types of tests require different levels of use and manipulation of the information presented in the chart.

Task complexity

Carpenter and Shah (1988) identify pattern recognition, interpretive processes, and integrative processes as the strategies required to complete tasks of varying complexity.

  • Pattern recognition requires the viewer to encode graphic patterns.

  • Interpretive processes operate on those patterns to construct meaning.

  • Integrative processes then relate the meanings to the contextual scenario as inferred from labels and titles.

Research objectives

Big Idea: Are there benefits to displaying exponentially increasing data on a log scale rather than a linear scale?

  1. Perception through Lineups – tests an individual’s ability to perceptually differentiate exponentially increasing data with differing rates of change on both the linear and log scale.

  2. Prediction with ‘You Draw It’ – tests an individual’s ability to make predictions for exponentially increasing data.

  3. Estimation by Numerical Translation – tests an individual’s ability to translate a graph of exponentially increasing data into real value quantities.

The series of graphical tests was conducted through an RShiny application available at https://emily-robinson.shinyapps.io/perception-of-statistical-graphics-log/.

Perception through lineups

Lineup experimental task

Study Participant Prompt: Which plot is most different?

Lineup study design

Lineup results

Prediction through ‘You Draw It’

‘You Draw It’ experimental task

Study Participant Prompt: Use your mouse to fill in the trend in the yellow box region.

‘You Draw It’ study design

Treatment combinations shown:

  • Low growth rate, 50% truncation
  • Low growth rate, 75% truncation
  • High growth rate, 50% truncation
  • High growth rate, 75% truncation

‘You Draw It’ prediction results

Numerical Translation and Estimation

Integrative processes

Graph Comprehension

Graph comprehension involves three behaviors (Curcio 1987; Friel, Curcio, and Bright 2001; Glazer 2011; Jolliffe 1991; Wood 1968):

  1. literal reading of the data (elementary level)
  2. reading between the data (intermediate level)
  3. reading beyond the data (advanced level).

Estimation Biases

Questioning

| Question type | Tribble scenario | Ewok scenario |
|---|---|---|
| Open Ended | Between stardates 4530 and 4540, how does the population of Tribbles change? | Between 30 and 40 ABY, how does the population of Ewoks change? |
| Elementary Q1 | What is the population of Tribbles in stardate 4510? | What is the population of Ewoks in 10 ABY? |
| Elementary Q2 | In what stardate does the population of Tribbles reach 4,000? | In what ABY does the population of Ewoks reach 4,000? |
| Intermediate Q1 | From 4520 to 4540, the population increases by ____ Tribbles. | From 20 ABY to 40 ABY, the population increases by ____ Ewoks. |
| Intermediate Q2 | How many times more Tribbles are there in 4540 than in 4520? | How many times more Ewoks are there in 40 ABY than in 20 ABY? |
| Intermediate Q3 | How long does it take for the population of Tribbles in stardate 4510 to double? | How long does it take for the population of Ewoks in 10 ABY to double? |

Simulated Data

Estimation strategy

Estimation results

Estimation of population (Elementary Q1)

“What is the population in year 10?”

Estimation of population (First level estimates)

Estimation of population (Scratchwork)

Participant calculations and scratch work support the conclusion that participants treated the spatial halfway point as the numerical halfway point.

2048 − 1024 = 1024

1024/2 = 512

512 + 1024 = 1536

32768 − 16384 = 16384

32768 − 16384 = 16384

16384 ∗ 2 = 32768

16384/2 = 8192

8192 + 16384 = 24576.
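The scratch work above can be checked numerically. The sketch below assumes grid lines at 1024 and 2048 (and at 16384 and 32768), as the participant calculations suggest. It compares the arithmetic midpoint that participants computed with the value that actually sits halfway between two grid lines on a log scale, which is their geometric mean. The function names are illustrative and not part of the study materials.

```python
import math


def arithmetic_midpoint(lower, upper):
    """Value halfway between two grid lines on a linear scale."""
    return lower + (upper - lower) / 2


def log_scale_midpoint(lower, upper):
    """Value drawn halfway between two grid lines on a log scale.

    Halfway in log space is the geometric mean, whatever the log base.
    """
    return math.sqrt(lower * upper)


# Grid-line pairs taken from the participant scratch work (assumed to be grid lines).
for lower, upper in [(1024, 2048), (16384, 32768)]:
    linear_mid = arithmetic_midpoint(lower, upper)
    log_mid = log_scale_midpoint(lower, upper)
    print(f"{lower} to {upper}: arithmetic {linear_mid:.0f}, log-scale {log_mid:.0f}")
```

The arithmetic midpoints (1536 and 24576) match the scratch work, while the log-scale midpoints are roughly 1448 and 23170, so reading "halfway" arithmetically overestimates the plotted value.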

Additive increase in population (Intermediate Q1)

From 20 to 40, the population increases by ____ [creatures].

Multiplicative change in population (Intermediate Q2)

How many times more [creatures] are there in 40 than in 20?

Misunderstanding of log logic

Misinterpretation and miscalculations in Intermediate Q1 and Q2

Additive Increase Incorrect Logic

2048 − 1024 = 1024

1024 + 512 = 1536

16384 − 1536 = 14848

14848/1536 = 9.67.


Multiplicative Change Incorrect Logic

16384 − 1536 = 14848
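To make the distinction concrete, the sketch below contrasts the additive increase with the multiplicative change for the two estimates that appear in the scratch work (1536 and 16384). It assumes that "how many times more" is meant as the ratio of the later value to the earlier one. Under that reading, dividing the difference by the earlier value gives the relative increase, which is one less than the ratio. The function names are illustrative.

```python
def additive_increase(earlier, later):
    """Answer to 'the population increases by ____': a difference."""
    return later - earlier


def multiplicative_change(earlier, later):
    """Answer to 'how many times more' (assumed reading): a ratio."""
    return later / earlier


# Estimates taken from the participant scratch work.
earlier, later = 1536, 16384

diff = additive_increase(earlier, later)        # 14848
ratio = multiplicative_change(earlier, later)   # about 10.67
relative_increase = diff / earlier              # about 9.67, i.e. ratio - 1

print(f"additive increase: {diff}")
print(f"multiplicative change: {ratio:.2f}")
print(f"difference divided by earlier value: {relative_increase:.2f}")
```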

Main take-aways

  • Understanding log logic is difficult.

  • Accuracy greatly depends on the location of the value being estimated in relation to the magnitude.

  • Strong anchoring and rounding effects.

    • Participants were resistant to estimating between grid lines on the log scale.

    • Participants inaccurately equated spatial distance with quantitative difference.

  • Inaccurate first-level estimates carry over into estimates that require comparing two points.

  • Estimates depended on the particular simulated data set.

Conclusion and Discussion

Conclusions

1. Perception through Lineups

  • Perceptual differences result from the contextual appearance of the trends, which depends on the choice of scale.

2. Prediction through ‘You Draw It’

  • Participants clearly underestimated trends with high exponential growth rates when making predictions on the linear scale.

3. Numerical Translation and Estimation

  • Log logic is difficult, and we often misinterpret and miscalculate multiplicative relationships.
  • The log scale improved estimation accuracy for small magnitudes, but its accuracy declined as magnitudes increased, where the linear scale held the advantage.

Overall Recommendations

  • Log scales offer perceptual advantages because they change the contextual appearance of the data.

  • Our understanding of log logic is flawed when translating the information into context.

  • We recommend consideration of both user needs and graph specific tasks when presenting data on the log scale.

  • Caution should be taken when interpretation of large magnitudes is required, but advantages may appear when it is necessary to visually identify and interpret small magnitudes on the chart.

Future work

References

Beeby, AW, and HPJ Taylor. 1973. “How Well Can We Use Graphs.” The Communicator of Scientific and Technical Information 17: 7–11.
Burn-Murdoch, John, Caroline Nevitt, Cale Tilford, Andrew Rininsland, Joanna Kao, Oliver Elliott, Emma Lewis, Brooke Fox, and Martin Stabe. 2020. “Coronavirus Tracked: Has the Epidemic Peaked Near You?” Coronavirus Chart: See How Your Country Compares. Financial Times. https://ig.ft.com/coronavirus-chart/?areas=eur.
Carpenter, Patricia A, and Priti Shah. 1988. “A Model of the Perceptual and Conceptual Processes in Graph Comprehension,” 26.
Curcio, Frances R. 1987. “Comprehension of Mathematical Relationships Expressed in Graphs.” Journal for Research in Mathematics Education 18 (5): 382–93. https://doi.org/10.2307/749086.
Dunham, Penelope, and Alan Osborne. 1991. “Learning How to See: Students Graphing Difficulties.” Focus on Learning Problems in Mathematics 13 (4): 35–49.
Friel, Susan N., Frances R. Curcio, and George W. Bright. 2001. “Making Sense of Graphs: Critical Factors Influencing Comprehension and Instructional Implications.” Journal for Research in Mathematics Education 32 (2): 124–58. https://doi.org/10.2307/749671.
Glazer, Nirit. 2011. “Challenges with Graph Interpretation: A Review of the Literature.” Studies in Science Education 47 (2): 183–210. https://doi.org/10.1080/03057267.2011.605307.
Jolliffe, Flavia. 1991. “Assessment of the Understanding of Statistical Concepts.” In Proceedings of the Third International Conference on Teaching Statistics, 1:461–66.
Leinhardt, Gaea, Orit Zaslavsky, and Mary Stein. 1990. “Functions, Graphs, and Graphing: Tasks, Learning, and Teaching.” Review of Educational Research 60 (1): 1–64.
Myers, Robert. 1954. “Accuracy of Age Reporting in the 1950 United States Census.” Journal of the American Statistical Association 49 (268): 826–31.
Tan, Joseph, and Izak Benbasat. 1990. “Processing of Graphical Information: A Decomposition Taxonomy to Match Data Extraction Tasks and Graphical Representations.” Information Systems Research, 416–39.
Von Bergmann, Jens. 2021. “Xkcd_exponential: Public Health Vs Scientists.” mountainMath. GitHub. March 17, 2021. https://github.com/mountainMath/xkcd_exponential.
Wood, R. 1968. “Objectives in the Teaching of Mathematics.” Educational Research 10 (2): 83–98.

Emily A. Robinson

github.com/earobinson95

Appendix

Lineup GLMM

Define \(Y_{ijkl}\) to be the event that participant \(l\) correctly identifies the target plot for data set \(k\) with curvature \(j\) plotted on scale \(i\).

\[\text{logit } P(Y_{ijkl}) = \eta + \delta_i + \gamma_j + \delta\gamma_{ij} + s_l + d_k\]

where

  • \(\eta\) is the baseline average log odds of selecting the target plot.
  • \(\delta_i\) is the effect of the log/linear scale.
  • \(\gamma_j\) is the effect of the curvature combination.
  • \(\delta\gamma_{ij}\) is the two-way interaction effect of the scale and curvature.
  • \(s_l \sim N(0,\sigma^2_\text{participant})\), random effect for participant characteristics.
  • \(d_k \sim N(0,\sigma^2_{\text{data}})\), random effect for data specific characteristics.

We assume that random effects for data set and participant are independent.
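As a rough illustration of how the terms combine, the sketch below simulates responses from a model of this form using only the Python standard library. Every parameter value, the level names (`curv_A`, `curv_B`), and the fully crossed layout are hypothetical choices made for the example. They are not estimates or design details from the study.

```python
import math
import random


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_lineup_responses(n_participants=20, n_datasets=3, seed=1):
    """Simulate correct/incorrect target identification from the logit model."""
    rng = random.Random(seed)

    # Hypothetical fixed effects on the logit scale (illustration only).
    eta = 0.0
    delta = {"linear": 0.0, "log": 0.8}           # scale effect, delta_i
    gamma = {"curv_A": 0.0, "curv_B": -0.5}       # curvature effect, gamma_j
    delta_gamma = {("log", "curv_B"): 0.3}        # interaction, delta-gamma_ij

    # Independent random effects for participants (s_l) and data sets (d_k).
    sigma_participant, sigma_data = 0.7, 0.4
    s = [rng.gauss(0, sigma_participant) for _ in range(n_participants)]
    d = [rng.gauss(0, sigma_data) for _ in range(n_datasets)]

    rows = []
    for l in range(n_participants):
        for k in range(n_datasets):
            for i in delta:
                for j in gamma:
                    linear_predictor = (
                        eta + delta[i] + gamma[j]
                        + delta_gamma.get((i, j), 0.0)
                        + s[l] + d[k]
                    )
                    p = inv_logit(linear_predictor)
                    y = 1 if rng.random() < p else 0  # 1 = target plot identified
                    rows.append((i, j, k, l, p, y))
    return rows


rows = simulate_lineup_responses()
print(len(rows), "simulated lineup evaluations")
```

Simulated data like this could be passed to any mixed-model routine to check that it recovers the chosen values.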

Feedback data

For each participant, the final data set used for analysis contains:

  • \(x_{ijklm}\), \(y_{ijklm,drawn}\), and \(\hat y_{ijklm,NLS}\)

for:

  • growth rate \(i = 1,2\),
  • point truncation \(j = 1,2\),
  • scale \(k = 1,2\),
  • participant \(l = 1, \ldots, N_{participant}\), and
  • \(x_{ijklm}\) value \(m = 1, \ldots, 4 x_{max} + 1\).

Vertical residuals between the drawn and fitted values were calculated as \(e_{ijklm,NLS} = y_{ijklm,drawn} - \hat y_{ijklm,NLS}\).
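The residual calculation can be sketched as follows. The index range \(m = 1, \ldots, 4x_{max} + 1\) is read here as quarter-unit steps from 0 to \(x_{max}\), which is an assumption. The fitted curve and the drawn values are hypothetical stand-ins for the NLS fit and a participant's line.

```python
import math


def x_grid(x_max, per_unit=4):
    """x-values 0, 0.25, ..., x_max: 4 * x_max + 1 points (assumed spacing)."""
    return [m / per_unit for m in range(per_unit * x_max + 1)]


def vertical_residuals(y_drawn, y_fitted):
    """Residual e = y_drawn - y_fitted at each x increment."""
    if len(y_drawn) != len(y_fitted):
        raise ValueError("drawn and fitted values must have the same length")
    return [yd - yf for yd, yf in zip(y_drawn, y_fitted)]


xs = x_grid(20)

# Hypothetical fitted exponential curve standing in for the NLS fit.
y_fitted = [math.exp(0.1 * x) for x in xs]

# Hypothetical participant line that sits 10% below the fitted curve.
y_drawn = [0.9 * y for y in y_fitted]

residuals = vertical_residuals(y_drawn, y_fitted)
print(len(xs), "increments; first residual:", round(residuals[0], 3))
```

Negative residuals, as in this made-up example, correspond to a drawn line that falls below the fitted curve.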

Prediction GAMM

Fit separately for each growth rate (low = 1, high = 2), the GAMM equations for residuals are given by:

\[e_{1jklm,NLS} = \tau_{1jk} + s_{1jk}(x_{1jklm}) + p_{l} + s_{l}(x_{1jklm})\]

\[e_{2jklm,NLS} = \tau_{2jk} + s_{2jk}(x_{2jklm}) + p_{l} + s_{l}(x_{2jklm})\]

where

  • \(e_{\cdot jklm,NLS}\) is the residual between the drawn \(y\)-value and fitted \(y\)-value for the \(l^{th}\) participant, \(m^{th}\) increment, and \(\cdot jk^{th}\) treatment combination
  • \(\tau_{\cdot jk}\) is the intercept for the \(j^{th}\) point truncation, and \(k^{th}\) scale treatment combination
  • \(s_{\cdot jk}\) is the smoothing spline for the \(\cdot jk^{th}\) treatment combination
  • \(x_{\cdot jklm}\) is the \(x\)-value for the \(l^{th}\) participant, \(m^{th}\) increment, and \(\cdot jk^{th}\) treatment combination
  • \(p_{l} \sim N(0, \sigma^2_\text{participant})\) is the error due to the \(l^{th}\) participant’s characteristics
  • \(s_{l}\) is the random smoothing spline for the \(l^{th}\) participant.

Estimation results

Open ended

“Between year 30 and 40, how does the population of [creatures] change?”

Estimation of time (Elementary Q2)

“In what year does the population reach 4,000?”

Time until population doubles (Intermediate Q3)

How long does it take for the population in year 10 to double?
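For reference, the sketch below works out the doubling time under the assumption that the population follows a noise-free exponential curve \(y = a e^{bx}\). Both the functional form and the parameter values are hypothetical, and the study's simulated data may differ. Under this assumption the doubling time is \(\ln 2 / b\) regardless of the starting year.

```python
import math


def population(x, initial=1.0, growth_rate=0.1):
    """Hypothetical noise-free exponential growth: y = initial * exp(growth_rate * x)."""
    return initial * math.exp(growth_rate * x)


def doubling_time(growth_rate):
    """Time for an exponentially growing quantity to double: ln(2) / growth_rate."""
    return math.log(2) / growth_rate


t = doubling_time(0.1)
start = population(10)
end = population(10 + t)

print(f"doubling time: {t:.2f} time units")
print(f"population ratio after that time: {end / start:.2f}")
```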
